1. Introduction

Search results containing a local pack often get the majority of clicks. Knowing which local ranking factors to optimize for the biggest impact is crucial for SEOs and business owners alike.

Smaller data studies as well as opinion-based surveys have sought to uncover the relevance and importance of local ranking factors (see https://www.localseoguide.com/guides/local-seo-ranking-factors/; https://moz.com/local-search-ranking-factors; https://www.brightlocal.com/research/how-car-dealerships-are-speeding-ahead-with-google-my-business/).

However, in our view, most of these studies are outdated and contain severe statistical and methodological flaws.

This study intends to fill that gap and shed light on which local ranking factors are the most important in the personal injury niche.

2. Methodology

  1. How accurately can the rankings be predicted given the independent variables?
  2. What are the most important features for the predictions?
  3. What is the direction of the impact?

The statistical method chosen for this study was a gradient boosted decision trees (GBDT) model. GBDT is a widely used machine learning technique that applies to many settings, ranging from regression and classification to learning-to-rank problems. In a learning-to-rank problem, there is an ordered list of items, and the goal of the model is to calculate a score for each item from the independent variables such that sorting by score retains the original order.
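To make the learning-to-rank goal concrete, here is a minimal sketch in plain Python. It does not train a GBDT; it only checks whether a set of scores retains an observed order by counting misordered pairs. The function name, positions, and scores are hypothetical and chosen purely for illustration.

```python
from itertools import combinations

def count_misordered_pairs(observed_positions, scores):
    """Count item pairs whose score order contradicts the observed ranking.

    observed_positions: 1 = top result, larger = lower in the list.
    scores: model scores, where a higher score means "should rank higher".
    Tied scores are counted as misordered.
    """
    misordered = 0
    for i, j in combinations(range(len(scores)), 2):
        # Item i is observed above item j when its position number is smaller.
        i_above_j = observed_positions[i] < observed_positions[j]
        i_scored_higher = scores[i] > scores[j]
        if i_above_j != i_scored_higher:
            misordered += 1
    return misordered

# Hypothetical search with four results (positions 1..4) and made-up scores.
positions = [1, 2, 3, 4]
good_scores = [2.1, 1.4, 0.3, -0.8]  # strictly decreasing: order retained
bad_scores = [2.1, 0.3, 1.4, -0.8]   # results at positions 2 and 3 swapped

print(count_misordered_pairs(positions, good_scores))
print(count_misordered_pairs(positions, bad_scores))
```

A ranking model is trained so that, across many searches, as few pairs as possible end up misordered; the absolute size of a score carries no meaning on its own.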

In building the model, the data set was split into two parts: training data (around 70% of the searches) and test data (the remaining 30% or so). The GBDT model was fitted on the training data, predictions were calculated for the test data, and finally the predictions were compared to the observed rankings. The chosen evaluation metric was Spearman’s rank correlation coefficient, a scaled measure of the agreement between two rankings. Perfectly matching rankings give a value of 1, the expected value for random rankings is 0, and a fully reversed order gives -1.
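The split and the evaluation metric can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the study's actual pipeline: the helper names, the seed, the ten search ids, and the scores are all hypothetical, and the Spearman formula used here is the textbook shortcut that is only valid when there are no tied ranks.

```python
import random

def split_searches(search_ids, train_share=0.7, seed=0):
    """Assign whole searches (not single results) to train or test."""
    unique_ids = sorted(set(search_ids))
    random.Random(seed).shuffle(unique_ids)
    cut = round(len(unique_ids) * train_share)
    return set(unique_ids[:cut]), set(unique_ids[cut:])

def ranks_from_scores(scores):
    """Rank 1 goes to the highest score; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def spearman(observed_ranks, predicted_ranks):
    """Spearman's rho as 1 - 6*sum(d^2) / (n*(n^2 - 1)); needs n > 1, no ties."""
    n = len(observed_ranks)
    d_squared = sum((o - p) ** 2 for o, p in zip(observed_ranks, predicted_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Ten hypothetical search ids, split roughly 70/30.
train_ids, test_ids = split_searches(range(10))

# One hypothetical test search: observed positions and made-up model scores.
observed = [1, 2, 3, 4, 5]
scores = [0.9, 0.2, 0.5, 0.1, -0.3]
rho = spearman(observed, ranks_from_scores(scores))
print(len(train_ids), len(test_ids), rho)
```

Splitting by search rather than by individual result keeps all results of one search on the same side of the split, so the model is never evaluated on a ranking it has partly seen during training.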

The next step is to understand why the model makes particular predictions: which independent variables are the most important, and how do their values affect the predictions? For this purpose, SHapley Additive exPlanations (SHAP) values were calculated. SHAP presents each prediction as a sum of contributions, one per independent variable. The overall impact of any particular variable can then be measured as the average of the absolute values of its contributions over the whole data set.
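The additive idea behind SHAP can be illustrated with a brute-force Shapley computation on a toy model. This sketch is a simplification: it replaces "missing" features with a single baseline row, whereas SHAP implementations for tree models use more efficient algorithms and average over the data. The toy model, its coefficients, the feature order (reviews, photos, posts updates), and the rows are all hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction by enumerating feature subsets.

    Features outside a subset are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def mean_abs_importance(all_phi):
    """Average absolute contribution per feature over all rows."""
    n_rows = len(all_phi)
    return [sum(abs(row[k]) for row in all_phi) / n_rows
            for k in range(len(all_phi[0]))]

def toy_model(features):
    """Hypothetical scoring function; the coefficients are made up."""
    reviews, photos, posts_updates = features
    return 0.02 * reviews + 0.01 * photos + 0.5 * posts_updates

baseline = [14, 5, 0]                          # hypothetical reference listing
rows = [[10, 5, 1], [50, 0, 0], [2, 20, 1]]    # hypothetical listings

all_phi = [shapley_values(toy_model, row, baseline) for row in rows]
importance = mean_abs_importance(all_phi)
print(all_phi[0], importance)
```

For every row, the contributions add up to the difference between that row's prediction and the baseline prediction. The sign of a single contribution gives the direction of the impact, while the mean absolute value gives the overall importance.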

Model results

A closer look at the features

The independent variables used in this study can be roughly organized into five main groups. These are listed below, together with a few important variables suggested by the SHAP values.

In terms of SEO, the first two categories are of little interest, as they are difficult or even impossible to change or adjust. The last three are more interesting and worth further investigation.

Type category

Basic information about type categories
Metric                                    Value
Total unique categories                   72
Missing type category                     1.99%
Categories with >=10 results              26
Categories with >=100 results             13
Categories with >=1000 results            3
Median unique categories in one search    4
Min unique categories in one search       1
Max unique categories in one search       11

Key takeaways:

Title and description

Basic information about titles and descriptions
Metric                                        Title     Description
Median character length (non-missing)         24        534
Min character length (non-missing)            4         8
Max character length (non-missing)            125       752
Missing                                       0.01%     40.7%
Containing lawyer or attorney                 22.65%    43.57%
Containing car accident or personal injury    5.31%     44.7%
Containing city name                          5%        27.07%

Key takeaways:

Reviews

Basic information about reviews
Metric                                Value
Median number of reviews              14
Max number of reviews                 968
No reviews available                  16.59%
Average rating                        4.61
Response ratio by owners              33.43%
Average number of likes per review    0.66

Key takeaways:

Provided updates and number of photos

Basic information about number of photos and Google updates
Metric                      Value
Median number of photos     5
Max number of photos        540
Zero photos                 5.78%
Provides Google updates     54.79%